Multilayer Perceptron Experiments on Sine Wave Part 2#
In the previous experiment, we showed that an MLP with ReLU activations cannot extrapolate the sine function. Recent research has shown that ReLU networks tend to extrapolate linearly and so are ill-suited for extrapolating periodic functions (Xu, et al., 2021).
To induce a periodic extrapolation bias in neural networks, Ziyin, et al. (2020) proposed a simple activation function called “Snake activation” with the form \(x + \frac{1}{a}\sin^2(ax)\), where \(a\) can be treated as a constant hyperparameter or a learned parameter.
We experimented with the Snake activation to see whether it can fit and extrapolate a simple sine function. We also examined how the Snake activation compares against alternative, similar-looking activation functions:
\(\sin(ax)\)
\(\sin^2(ax)\)
\(x + \frac{1}{a}\sin(ax)\)
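A minimal sketch of the four activations, assuming NumPy and treating \(a\) as a fixed constant (the function names are ours, not from the paper's code):

```python
import numpy as np

def sin_act(x, a=1.0):
    """sin(ax): purely periodic, no identity component."""
    return np.sin(a * x)

def sin_sq(x, a=1.0):
    """sin^2(ax): periodic and non-negative."""
    return np.sin(a * x) ** 2

def x_plus_sin(x, a=1.0):
    """x + (1/a) sin(ax): identity plus an unsquared periodic term."""
    return x + np.sin(a * x) / a

def snake(x, a=1.0):
    """x + (1/a) sin^2(ax): the Snake activation (Ziyin, et al., 2020)."""
    return x + np.sin(a * x) ** 2 / a
```

The \(\frac{1}{a}\) factor keeps Snake's periodic term bounded by \(\frac{1}{a}\), so its output never deviates from the identity by more than that amount.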
Extrapolation Experiment#
We generated synthetic data using the \(\sin(x)\) function, which we aim to learn. The blue points are the training set, which we use to train our model. The orange points are the test set, which we use to check whether our model can generalize and actually learned the sine function.
Our inputs are the x-values (horizontal axis) and our targets are \(y = \sin(x)\), which we train our model to predict given \(x\).
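The data setup can be sketched as follows (a minimal sketch, assuming NumPy; the sampling range and split boundary are ours for illustration, not the exact values used):

```python
import numpy as np

# Sample x on a line and compute the target y = sin(x).
x = np.linspace(-10.0, 10.0, 400)
y = np.sin(x)

# Extrapolation setup: train on a short central segment (blue),
# test on the points outside it (orange).
train_mask = np.abs(x) <= 3.0
x_train, y_train = x[train_mask], y[train_mask]
x_test, y_test = x[~train_mask], y[~train_mask]
```

Because every test point lies outside the training interval, the model can only do well if it has learned the periodic structure rather than memorized the segment.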
We show an animation below of how the network's fit to the data evolves over training (epochs).
In all experiments, we used:
Xavier Normal initialization
Two hidden layers
256 neurons per hidden layer
learning rate = 0.0001
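A minimal sketch of this architecture in NumPy (forward pass only; the actual training framework isn't shown here, and using Snake on the hidden layers is our illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(42)

def xavier_normal(fan_in, fan_out):
    """Xavier (Glorot) normal init: std = sqrt(2 / (fan_in + fan_out))."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def snake(x, a=1.0):
    return x + np.sin(a * x) ** 2 / a

# Two hidden layers with 256 neurons each; 1-D input and output.
sizes = [1, 256, 256, 1]
weights = [xavier_normal(m, n) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    h = x.reshape(-1, 1)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = snake(h @ W + b)  # activation on hidden layers only
    return (h @ weights[-1] + biases[-1]).ravel()
```

The output layer is left linear, as is standard for regression targets like \(y = \sin(x)\).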
We experimented with different model complexities to see how they fit the data and generalize to the test set.
We find that as the number of nodes per layer increases, while maintaining a fixed number of layers, the model's ability to fit the data improves.
Interpolation Experiment#
Now, what if we make this an interpolation problem instead of an extrapolation problem? In other words, what if we swap the train and test sets? Will the model be able to infer the sine wave at the held-out points in between? We show below the sine wave colored by train (blue) and test (orange).
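Swapping the roles of the two sets can be sketched the same way (assuming NumPy, with an illustrative split boundary of our choosing):

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 400)
y = np.sin(x)

# Interpolation setup: hold out the middle segment for testing,
# train on the outer regions, so the model must fill in the gap.
test_mask = np.abs(x) <= 3.0
x_train, y_train = x[~test_mask], y[~test_mask]
x_test, y_test = x[test_mask], y[test_mask]
```

This is an easier task than extrapolation: every test point is bracketed by training data on both sides.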
How does \(a\) of Snake activation affect the model predictions?#
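The role of \(a\) can be read off the activation itself: it sets the frequency of the periodic term, since \(\sin^2(ax)\) has period \(\frac{\pi}{a}\). A quick numerical check of this, assuming NumPy:

```python
import numpy as np

def snake(x, a=1.0):
    return x + np.sin(a * x) ** 2 / a

def periodic_part(x, a=1.0):
    """The oscillation Snake adds on top of the identity: (1/a) sin^2(ax)."""
    return snake(x, a) - x

# sin^2(ax) repeats every pi / a, so larger a means faster oscillation
# (and, via the 1/a factor, a smaller deviation from the identity).
x = np.linspace(0.0, 2 * np.pi, 50)
for a in (0.5, 1.0, 2.0):
    assert np.allclose(periodic_part(x + np.pi / a, a), periodic_part(x, a))
```

Intuitively, a larger \(a\) biases the network toward higher-frequency periodic structure, so matching \(a\) to the frequency of the target function should make fitting easier.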
Take-aways#
We performed an ablation study on how different activation functions learn to extrapolate a sine function from a short segment of data.
It looks like the Snake activation does perform best among the alternative, similar-looking activation functions. The closest in performance to Snake is \(x + \frac{1}{a}\sin(ax)\), which doesn't square the sine.